Named Entity Resolution Using Automatically Extracted Semantic Information
Authors
Abstract
A major problem in text mining and semantic retrieval is that detected entity mentions have to be assigned to the true underlying entity. Name ambiguity results from both polysemy and synonymy: the name of a unique entity may be written in variant ways, and different unique entities may share the same name. The term “bush”, for instance, may refer to a woody plant, a mechanical fixing, a nocturnal primate, 52 persons and 8 places covered in Wikipedia, and thousands of other persons. To our knowledge, this is the first application of a kernel entity resolution approach to the German Wikipedia as a reference for named entities. We describe the context of named entities in Wikipedia and the context of a detected name phrase in a new document by a context vector of relevant features. These features are derived from automatically extracted topic indicators generated by an LDA topic model. We use kernel classifiers, e.g. rank classifiers, to determine the correct matching entity and also to detect entities not covered by Wikipedia. Compared with a baseline approach using only text similarity, adding topic features yields a much higher F-value, which is comparable to the results published for English. The procedure is also able to detect with high reliability whether a person is not covered by Wikipedia.
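The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it replaces the kernel and rank classifiers with plain cosine similarity over topic vectors, and it assumes the topic distributions have already been produced by some topic model. The function names, the candidate labels, the three-topic vectors, and the threshold value are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def resolve(mention_topics, candidates, threshold):
    """Return the best-matching candidate name, or None if nothing is similar enough.

    mention_topics: topic vector of the context around the detected name.
    candidates: dict mapping a candidate entity name to its topic vector.
    threshold: minimum similarity; below it the mention is treated as
               referring to an entity that is not in the reference set.
    """
    if not candidates:
        return None
    score, name = max(
        (cosine(mention_topics, topics), name)
        for name, topics in candidates.items()
    )
    return name if score >= threshold else None


# Hypothetical topic vectors over three assumed topics:
# (politics, botany, engineering).
CANDIDATES = {
    "George W. Bush": [0.90, 0.05, 0.05],
    "Bush (plant)": [0.05, 0.90, 0.05],
    "Bush (bearing)": [0.05, 0.05, 0.90],
}

if __name__ == "__main__":
    print(resolve([0.80, 0.10, 0.10], CANDIDATES, threshold=0.8))
    print(resolve([0.34, 0.33, 0.33], CANDIDATES, threshold=0.8))
```

The threshold plays the role of the "not covered" decision: a mention whose context matches no candidate well is left unassigned instead of being forced onto the nearest entity.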
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Extracting Semantic Networks among Named Entities from Websites
To enable machine processing of webpages, it is important to identify the relationships among named entities. Named entities such as people, organizations, and places are important pieces of information that must be extracted. The scale of the web indicates that manual extraction is not feasible. We propose a system that automatically constructs a semantic network of named entities from webpages...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Towards an Integrated Corpus for the Evaluation of Named Entity Recognition and Object Consolidation
When faced with the task of incorporating legacy web data from existing HTML pages into the Semantic Web (SW), a widespread approach is to use Information Extraction (IE) and Named Entity Recognition (NER) techniques. Natural language texts are annotated automatically or semi-automatically, and thus formal data is extracted from the texts. While this allows new sets of data to be added to the SW, th...
Ontology Supported Automatic Generation of High-Quality Semantic Metadata
Large amounts of data in modern information systems, such as the World Wide Web, require innovative information retrieval techniques to effectively satisfy users’ information need. A promising approach is to exploit document semantics in the IR process. For this purpose, high-quality semantic metadata is needed. This paper introduces a method to automatically create semantic metadata by using o...